ABSTRACT

This study evaluates the job relatedness of the Police Services
Examination given in Massachusetts in October 1975. A predictive
criterion-related validity methodology was employed, with the written
examination grade as the predictor and performance in Police Academy
training as the criterion. Descriptive statistics for the applicant
population of over 20,000 persons and the smaller validation sample
(n=376) are reported. The written examination was found to be highly
predictive of the academy grade: unadjusted correlation coefficient
r=.49, p < .001; adjusted r=.71, p < .001. This high degree
of prediction, considered with the selection ratio and success ratio,
indicates that the 1975 Police Services Examination has a high degree
of utility as a selection device.
FOREWORD

This report has been in existence in draft form for several years.
It has been edited and published at this time as part of our ongoing
effort to develop and document improved public safety examinations.
Some of the references cited in this report are dated. This is due to
the delayed publication of the report. Aside from this purpose, this
report is intended to serve a heuristic purpose, and therefore has more
detail in some areas than would otherwise be appropriate.
INTRODUCTION

1.1  Background 

In June of 1975 a statewide examination for Police Officer was administered to
over 22,000 applicants. A large number of the cities and towns rely upon an
eligibility list, established by this statewide Civil Service written examination, to
provide a pool of candidates from which to make appointments to their police forces.
As in other states, this practice has a history of over one hundred years, originating
as a paradigm developed by State and Federal Civil Service reformers in the 1880's
as the means of taking "politics out of Civil Service and Civil Service out of
politics" (Rosenblum & Oluchowski, 1977, p. 9). Competitive merit examinations were
the means by which the Civil Service was to be purged of nepotism, cronyism, and
favoritism by establishing an avenue by which only those of superior ability would
be appointed. The merit system became the cornerstone of public personnel
administration; and in the area of entry level positions, written examinations became the
cornerstone of the merit system. In order for the merit system to legitimately select
personnel by reason of written tests, a high mark on the test must be a valid indicator
of the possession of qualities which would allow successful performance on the job.

The  questions  of  legitimacy  and  validity  arose  in  relation  to  the  entry  level 
Police  Officer  Examinations  administered  by  the  Civil  Service  Division  of 
Massachusetts  in  1968  and  1970.  A  class  action  suit  was  brought  against  the  Commission 
and the Boston Police Department in which the plaintiffs, Castro et al., charged that
those particular exams (1968-1970) were discriminatory and not job-related. The finding
of the U.S. Court of Appeals for the First Circuit stipulated that "the public employer
must, we think, in order to justify the use of a means of selection (the written exam),
shown to have a racially disproportionate impact, demonstrate that the means is in fact
substantially related to job performance. It may not, to state the matter another way,
rely on any reasonable version of the facts, but must come forward with convincing
facts establishing a fit between the qualifications and the job" (Castro v. Beecher,
459 F.2d 725 (1st Cir. 1972), p. 11). Under this directive of the Court, the Division
of  Personnel  Administration,  which  currently  administers  the  Civil  Service  Police 
Entrance  Examinations,  is  required  to  substantiate  that  any  written  examination  it 
uses  for  the  purpose  of  establishing  eligibility  lists  be  a  valid  indicator  of  qualities 
necessary  for  performing  the  actual  duties  of  a  Police  Officer;  face  validity  was 
deemed  inadequate. 

The court deemed the use of tests which have adverse impact on
minorities and whose "job relatedness" is undemonstrated to be a discriminatory practice,
unless and until the condition of substantiated job relatedness is met.

The final judgement in this case (April, 1973) required that written examinations
for police officer be validated in accordance with the testing guidelines of the
Equal Employment Opportunity Commission. This report presents a criterion-related
validity study in compliance with that requirement.

1.2  Validity  Concepts 

In  the  area  of  testing,  particularly  for  personnel  selection,  two  issues  are  of 
prime  concern:  the  reliability  and  the  validity  of  the  selection  instrument. 


Reliability refers to the consistency of scores for individuals. For example,
would persons score substantially the same if they were to take a similar test? The
validity of a test refers to whether the test measures what it purports to measure. We
can identify three overlapping approaches to validation:

Content  validity  involves  examining  the  test  to  determine  whether  the  total 
test  truly  and  fairly  represents  the  knowledges  and  abilities  purported  to  be 
measured.  This  type  of  validation  is  often  used  for  achievement  (knowledge)  tests, 
tests  which  measure  how  well  an  individual  has  mastered  a  certain  body  of  knowledge. 
The  total  set  of  knowledge  is  first  determined  or  described.  A  test  is  then  content 
valid  if  its  items  are  a  good  sample  of  the  total  set. 

Construct  validity  refers  to  the  extent  to  which  a  test  measures  a  theoretical 
construct  or  trait;  examples  include  intelligence,  creativity  and  extroversion. 
Aptitude  tests  may  also  fall  into  this  category.   Construct  validity  depends  upon 
the gradual accumulation over time, and in different situations, of information
on  the  nature  of  the  construct  and  the  conditions  which  affect  its  development. 

Criterion-related  validity  refers  to  the  comparison  of  the  test  to  a  criterion, 
i.e. a direct measure of what the test is supposed to predict. The American
Psychological Association's Test Standards (1974) differentiate between two methods
of criterion-related validity, concurrent and predictive, based largely on the time
interval between the collection of the predictor variable (the test) and the criterion
variable. In a concurrent validation study the test is administered to a sample of
incumbents, while the criterion information is measured concurrently (e.g. job
performance of presently employed police patrolmen). Statistical analysis is performed
to determine how well the test presently predicts this criterion.

In  predictive  validation,  the  test  is  typically  administered  to  an  applicant 
group.   If  there  is  a  group  of  applicants  for  a  position,  selection  for  that  position 
is  ideally  made  without  regard  to  their  scores  on  the  test.  At  a  future  time  the 
criterion  information  is  gathered  on  a  sample  of  the  selectees,  and  the  relationship 
between  the  test  scores  and  the  criterion  is  determined.   In  many  cases,  the  best 
method  to  use  in  personnel  selection,  when  feasible,  is  predictive  criterion-related 
validity. 

Due to the nature of the job of Police Officer, however, and the severe
consequences of hiring an unqualified officer, it is not desirable to hire applicants
without regard to qualifications. A practical approach to conducting a predictive
validity study is to use the test for selection purposes, then collect criterion
information on those selected and validate the test using this data. While this
method is economically practical, it gives rise to a statistical problem:
restriction of range. This problem does have a statistical solution which will be
dealt with in a later section.

1.3   The  Criterion 

The heart of any predictive validity study is the criterion. Many different
measures are possible. Among these are ratings of job performance and of performance
in training, and objective criteria such as attendance, punctuality, conviction
rate, and disciplinary action.


The  use  of  job  performance  as  a  criterion  entails  the  use  of  raters  or  some 
other  measurement  method  to  judge  the  performance  of  the  hires.  Not  only  is 
this procedure expensive, but also rater bias and problems with inter-rater
reliability can depress the magnitude and confuse the interpretation of any validation
evidence obtained. Further confounding job performance as a criterion is the
diversity of jobs and settings in which a policeman can be placed. These could
range, for example, from traffic work to investigation, from slum areas to areas
of high socio-economic status, or from urban to rural areas. Thus diversity and
room for error increase as one moves from people taking the same test to people
undergoing similar but not identical training to people assigned to diverse jobs.
In a study by the City of Phoenix (1975), using an entry level police examination
in a predictive validity study, a strong relationship was found between test scores
and training on the one hand (r=0.55) and a relatively weaker relationship between
training scores and job success on the other hand (r=0.30). It seems reasonable
here, and indeed is standard practice in other areas (e.g. Armed Forces), to
accept  training  data  as  a  suitable  substitute  for  job  performance  in  a  predictive 
validity  study. 

1.4  Reliability 

A  necessary  but  not  sufficient  characteristic  of  a  valid  test  is  a  high  degree 
of  reliability:  "a  test  can  be  reliable  without  being  valid,  but  it  cannot  be 
valid without being reliable" (Phillips, 1973, p. 48). The concept of reliability
could  be  compared  to  the  scores  on  a  pistol  range  target.  If  one  assumes  the 
marksman  is  a  good  shot,  then  the  accuracy  of  the  pistol  itself  is  analogous  to 
reliability  while  the  accuracy  of  the  sight  is  analogous  to  validity.   If  the 
pistol  is  inaccurate,  it  does  not  matter  how  good  the  sight  is,  the  shots  will  be 
spread  all  over  the  target.  If  the  pistol  is  accurate,  the  shots  should  end  up 
in  a  tight  group.  However,  if  the  sight  is  off,  this  tight  group  will  be  off  the 
bullseye.  Only  if  the  pistol  is  accurate  can  one  determine  whether  or  not  the 
sight  is  accurate.  In  testing,  only  if  a  test  is  reliable  can  it  be  "on  target", 
that  is,  valid. 

A  reliable  test  puts  scores  in  a  tight  group  and  a  valid  test  means  that 
the  scores  will  be  true  indicators  of  performance.  High  reliability  reduces  the 
probability  that  the  scores  are  due  to  chance  or  luck,  and  increases  the  probability 
that  the  scores  are  due  to  skill.  It  is  obviously  always  important  to  have  a  test 
of  high  reliability. 
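
The dependence of validity on reliability can be made concrete with a standard result from classical test theory. This result is not stated in the report and is offered only as an illustrative sketch. Writing r_xy for the validity coefficient and r_xx and r_yy for the reliabilities of the test and of the criterion (symbols chosen here for illustration), and assuming measurement errors are uncorrelated, the observed validity cannot exceed the square root of the product of the two reliabilities:

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}}
```

For example, a test with a reliability of 0.81, measured against a perfectly reliable criterion, could show a validity coefficient of at most 0.90.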

1.5  Other  Studies 

A  number  of  other  studies  have  been  done  to  validate  entry  level  police 
examinations, some with the same form as the test used in this study (developed
by the International Personnel Management Association, IPMA, Police Officer Form
A1). The California Selection Consulting Center (CSCC, 1973) conducted a
concurrent validity study using the IPMA Form A1, and determined that the test was
valid in predicting the present performance of officers on the job. The Educational
Testing Service (ETS, 1976) performed a concurrent validation study in four
jurisdictions on a preliminary test which was developed based on a job analysis. The
ETS Study found positive correlations on all abilities tested for one or more job
dimensions. A concurrent validation study conducted for the City of Philadelphia
by the Center for Occupational and Professional Assessment (1974) found the
entry level police examination developed by the City of Philadelphia to be
valid.


The  present  study  evaluates  the  predictive  validity  of  the  1975  entry  level 
police examination (IPMA J:A1) for the Commonwealth of Massachusetts using as the
criterion training data from police academies for those applicants selected. The
study  takes  into  account  factors  which  might  systematically  affect  the  relationship 
between  test  scores  and  academy  grades,  and  makes  an  assessment  of  the  practical 
utility  of  the  examination. 

METHODOLOGY 

2.1   Predictor 

The selection of a test to serve as a predictive instrument is governed by
several factors. Since the written test will have the function of predicting
adequate and/or superior job performance as a Police Officer, it is important to
take steps to ensure job-related test content. These procedural steps are
outlined in the psychological literature.

The  foundation  of  a  written  test  constructed  as  a  predictive  instrument  is 
laid  by  conducting  a  job  analysis  of  the  position.  Areas  on  the  test  are  selected 
for  their  relation  to  those  tasks  or  knowledges,  skills,  abilities  or  personal 
characteristics  identified  by  the  job  analysis. 

For  the  1975  test,  the  Division  of  Personnel  Administration  researched  the 
existing  literature  and  selected  the  International  Personnel  Management  Association's 
Police Officer J:A-1 (M). The IPMA Form A-1 (M) test contains five sub-tests:

Sub-Test  Subject                                No. of Questions

I         Verbal Ability                         25
II        Number Series                           5
III       Table Interpretation                    6
IV        Interpretation of Hypothetical Rules
          and Regulations                        15
V         Reading Comprehension                  20

          Total                                  71

The  California  Selection  Consulting  Center  performed  a  concurrent  validation 
study  with  the  IPMA  J:A1  as  the  test  instrument  and  job  performance  evaluation  as 
the criterion. This study found that "... the IPMA J:A1 examination (is)
significantly correlated with 'major, critical and important' elements of job success
for entry-level law enforcement positions" (Selection Consulting Center, 1973,
p. 53).

The  results  of  a  job  analysis  performed  by  Bio-Dynamics  Inc.,  Research 
Division  in  January  of  1972  on  the  Police  Officer  positions  in  Massachusetts 
identified  the  elements  of  "reading  comprehension,  powers  of  observation,  and 
analytical ability" as items which "... should be included as test items in
the written examination" (Bio-Dynamics, 1972, p. 10).


In  part  as  a  result  of  these  similarities,  sufficient  documentation  was  thought 
available  to  justify  applying  the  results  of  the  California/Nevada  Study  to  the 
Police  Officer  function  in  Massachusetts  in  accordance  with  E.E.O.C.  Guideline 
1607.7.  This  validity  generalization  served  as  the  basis  of  test  selection. 
Precise estimation of the validity of the IPMA Form A1 as a predictive
job-related instrument for Massachusetts' Police Officer position awaits the results
of  the  present  criterion-related  predictive  validity  study. 

The  actual  administration  of  the  1975  Entry  Level  Police  Officer  Examination 
was  conducted  in  the  classrooms  of  selected  public  schools  in  the  following  cities: 
Boston,  New  Bedford,  Lowell,  Lawrence,  Worcester,  Springfield,  and  Pittsfield. 
Other  cities  and  towns  were  added  as  required  by  the  distribution  of  the  applicant 
group  throughout  the  state.  Each  of  the  classrooms  had  a  monitor  who  was  responsible 
for  giving  the  directions  for  taking  the  exam,  assuring  the  integrity  of  the  testees' 
performance  and  collecting  additional  data  from  the  testees,  such  as  fingerprints, 
application forms and proof of birth date. The English language version of the
exam had two parts: Part 1, 1.5 hours in length, and Part 2, 45 minutes in length.
Applicants were not allowed to leave the classroom for any reason during the
entire administration. Part 1 consisted of the IPMA Police Officer J:A1; Part 2
was a test given for research purposes only and later discarded. Each classroom
monitor was provided with instructions for allowing applicants to take the examination
in Spanish if they so desired. Those applicants taking the Spanish exam, in addition
to the 1.5 hour and the 45 minute sections of the Entry Level Police Officer
Examination, were required to take a Reading Test of 40 minutes in length which measured
their reading proficiency in English. Only a handful of applicants chose to take
the exam in Spanish. Data from these applicants is not considered below.

The  Division  of  Personnel  Administration's  Examination  Bureau  scored  the 
October,  1975  administration  of  the  IPMA  J:A1  by  using  an  optical  scan  answer 
sheet  reader  with  a  mark  sense  answer  sheet.  The  Administration  and  Finance  Computing 
Center  generated  the  frequency  of  occurrence  of  the  percentaged  scores.  In  accordance 
with state law the "passing score", the score that an applicant would have to
receive in order to be included on the eligible list, was established at 61%. The
examination scores of the entire applicant group were then separated into categories:
male, female, Black, White, and Spanish-Surnamed, and for each of these groups the
frequency, score, mean, standard deviation, and variance were compiled (Division of
Personnel Administration press release issued on July 9, 1976).

The  original  scoring  generated  the  examination  data  in  a  form  which  prevented 
calculation  of  sub-scores.  The  present  study  rescored  the  original  examination 
answer  sheets,  in  order  to  obtain  each  applicant's  total  score  and  scores  on  the 
five sub-tests. The descriptive statistics of interest include the examination
raw scores for each applicant, and the mean and standard deviation for the entire
population and for subsamples of Black, White, Spanish-Surnamed, male and female
applicants.

Descriptive  statistics  for  each  of  these  subsamples  will  be  compared  to  the 
population parameters, and similarities and differences will be noted. In this
fashion the general performance of identifiable minority groups will be derived,
allowing intergroup comparisons.


The  Kuder-Richardson  method  of  determining  inter-item  consistency  will  be 
used to index the reliability of the IPMA J:A1. This method will be employed on
the  test  as  a  whole  and  upon  each  of  the  sub-tests  where  it  would  yield  adequate 
results. 
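
The report does not say which Kuder-Richardson formula was applied. As a minimal sketch, assuming formula 20 (KR-20) and items scored 1 for correct and 0 for incorrect, the computation could look like the following. The function name `kr20` and the small response matrix are hypothetical and are not taken from the study data.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: one row per examinee, each row a list of 0/1 item scores.
    Assumption: KR-20 is used; the report names only "the Kuder-Richardson method".
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")

    # Population variance of the total (number-correct) scores.
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    # Sum over items of p*q, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Hypothetical answer sheets: 5 examinees, 4 items.
sheets = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(sheets)
```

Because the coefficient depends on the number of items, a very short sub-test (such as a five-item section) may give an unstable estimate, which is consistent with applying the method only where it would yield adequate results.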


2.2  Criterion


The  IPMA  J:A1  was  used  by  the  Division  of  Personnel  Administration  in  October 
of  1975  as  a  means  of  determining  aptitude  for  the  entry  level  police  position. 
Implicit  in  the  test's  use  was  the  concept  that  a  certain  standard  of  performance  on 
the  test  would  indicate  whether  an  individual  would  make  a  qualified  police  officer. 
Based  on  validity  and  validity  generalization  evidence,  it  was  thought  that  adequate 
performance  on  the  test  demonstrated  the  qualities  required  to  become  a  good  police 
officer.  Demonstrating  the  worth  of  this  assumption  entails  evaluating  the 
validity  of  the  test.   For  the  test  to  have  high  validity,  test  performance  must 
have a positive correlation with a measure of success as a policeman. Job
performance (i.e. a successful police career) is the ultimate measure, for only when
a  person  has  shown  capability  on  the  actual  job  is  ultimate  success  demonstrated. 

Another acceptable method of validating a test such as the IPMA J:A-1
is a comparison between test performance and training records. A classic example
of this type of criterion validation is the comparison of the Air Force pilot selection
test against basic flight training performance (Anastasi, 1976).

Each  of  these  two  methods  has  advantages  and  disadvantages.  Job  performance 
has  a  long  maturation  period,  especially  in  Massachusetts.  Two  years  might  pass 
from  the  date  of  examination  to  date  of  appointment  to  a  police  force.  Several 
more  years  might  pass  prior  to  consideration  for  any  type  of  promotion.  Rater 
bias  also  seems  to  be  a  disadvantage  of  job  performance  ratings  since  they  require 
judgement  of  supervisors  who  might  be  subject  to  influences  other  than  purely  job 
performance.  Even  with  the  use  of  trained  raters  the  potential  problem  remains. 
The greatest advantage of job performance is that it need not be compared against
another  criterion,  since  it  is  the  desired  result. 

Advantages to using training as the criterion include a shorter maturation
time and an objective and coherent data set, especially if the curriculum is
standardized. If the curriculum is standardized, final grades are then a result of
demonstrated  performance  in  defined  areas.   Further,  in  Massachusetts,  those  people 
receiving  permanent  appointments  from  cities  and  towns  subscribing  to  the  state 
Civil  Service  system  are  required  by  law  to  attend  one  of  the  certified  police 
training  academies  operating  within  the  state.  Academy  training  provides  an  element 
of  commonality  for  those  appointed  to  the  various  police  forces  and  reinforces  its 
use  as  a  criterion  in  a  predictive  study.  A  drawback  of  training  as  a  criterion  is 
that  it  is  one  step  removed  from  job  performance. 

Beyond  this  strong  general  support  for  the  use  of  training  data  as  a  criterion, 
there  is  an  unpublished  study  (Gannon,  1978)  conducted  in  Massachusetts  which  shows 
that  level  of  performance  in  the  training  academy  programs  is  related  to  level  of 
performance  on  the  job.   For  these  reasons,  the  decision  was  made  to  use  the  more 
coherent  training  data  as  the  criterion  in  this  study  rather  than  job  performance. 


The  State  Police  and  the  Metropolitan  District  Commission  Academies,  as  well 
as  academies  in  nine  cities  and  towns,  whose  Police  Departments  operated  training 
courses  within  the  months  following  establishment  of  the  1975  eligible  list,  were 
contacted  and  their  cooperation  with  this  study  was  solicited  and  received. 

There  are  two  areas  where  "contamination"  has  traditionally  been  considered 
possible  in  criterion  development:  tenure  and  prior  knowledge  of  the  predictor 
results  by  those  involved  in  criterion  ratings.  Some  assurance  must  be  had  that 
those  involved  in  evaluating  criterion  performance,  in  this  case  police  academy 
training  instructors,  do  not  know  the  scores  which  individuals  in  their  class 
received on the entrance examination. Such knowledge tends to influence the grading
of criterion performance (Anastasi, 1976). The question of contamination by prior
knowledge of predictor results was investigated. The Massachusetts Criminal Justice
Training Council, responsible for the coordination of training in the state, provided
information on the type of data with which newly appointed candidates arrive at the
academies. This data does not contain the grade received by the candidate on the
examination. It seems that this aspect of criterion contamination does not require
further pursuit.

Tenure, as it is meant here, refers to the amount of experience a candidate
would accumulate on the job before attending an academy. Since not all candidates
were academy trained immediately after appointment, those having on-the-job experience
might be in a position of unfair advantage in academy training and affect the resulting
correlations. This matter of tenure does not seem to factor into this study. Since
it is a matter of legal necessity that a new appointee must attend and successfully
complete academy training within a year of being appointed, the longest time this
allows for on-the-job training is nine months. Other validation studies
considering the topic have not considered on-the-job training a significant influence unless
it was a period greater than three years. It would have been relatively easy to
examine these assumptions during the course of analysis; however, due to the Fair
Information Practices Act and the constraint of anonymity placed upon information
gathering by the Massachusetts Criminal Justice Training Council's interpretation of
this Act, this type of information was lost in the coding procedure.

2.3   Validation  Procedure 

2.3.1  Correlation Coefficient

The basic statistic used to measure validity is the Pearson product-moment
correlation coefficient. Unless otherwise stated, correlation will mean Pearson
correlation.  This  statistic  has  found  widespread  use  in  many  different  fields. 
The  correlation  coefficient  is  an  index  of  the  degree  of  linear  association  between 
two  variables,  in  this  case  the  examination  score,  or  subscore,  and  the  police 
academy  grade.  The  correlation  can  range  between  +1.00  and  -1.00.  The  sign  of 
the  coefficient  indicates  whether,  when  the  first  variable  increases,  the  second 
variable  increases  (positive  correlation)  or  decreases  (negative  correlation). 
The  greater  the  magnitude  of  the  coefficient,  i.e.   the  closer  to  plus  or  minus 
1.00,  the  greater  the  degree  of  association.  Thus  correlations  of  -0.90  or  +0.90 
indicate  a  very  high  degree  of  association,  the  former  in  a  negative  direction  and 
the  latter  in  a  positive  direction.  Correlations  of  +0.05  or  -0.05  indicate  little 
or  no  association,  since  they  are  both  close  to  0.00. 


In addition to the direction and magnitude of the Pearson correlation
coefficient it is important to consider the shape that a scatterplot of the data
assumes. Examination of scatterplot diagrams is especially helpful when correlations
are found to be near zero. Distributions may have correlations at or near zero and
yet have strong curvilinear relationships. A fast, effective method of distinguishing
a non-linear relationship from no relationship, both of which may exhibit correlations
at or near zero, is by examination of the distribution's scatterplot diagram. If
this shows that no apparent systematic pattern exists, then it is safe to assume that
there is no relationship between the two variables. On the other hand, two variables
may have a Pearson correlation very near zero and yet exhibit a strong
relationship in a non-linear fashion (perhaps due to a "ceiling" effect, with all
people scoring above a certain test score performing at the maximum level in the academy).
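
The following minimal sketch illustrates both points: how the Pearson coefficient is computed, and how a perfectly systematic but non-linear relationship can still yield a coefficient of zero. The helper `pearson_r` and the data are hypothetical and are not drawn from the examination or academy records.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # perfectly linear and increasing
curved = [v * v for v in x]       # fully determined by x, but U-shaped

r_linear = pearson_r(x, linear)   # at the positive end of the range
r_curved = pearson_r(x, curved)   # near zero despite a perfect relationship
```

A scatterplot of the second data set would show the U-shape at a glance, which is why a near-zero coefficient should be read alongside the plot.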

2.3.2  Range  Restriction 

The  problem  of  restriction  of  range  besets  any  criterion-related  measure  of 
validity  whenever  personnel  are  actually  selected  on  the  basis  of  the  test  used  as  the 
predictor.   In  the  present  case,  police  officers  who  go  into  training  at  the  academies 
represent  only  an  upper  segment  of  the  population  which  was  actually  tested,  since 
only those above the cutoff score can be selected. A direct measure of the validity
of the test can only be made for this upper segment of restricted range in the test
scores. However, the validity of the test for the total population of those who took
the test is of greater interest, for it is the validity of the whole test which
indicates the predictive power of the test. This restriction almost always results in
a  reduction  of  the  magnitude  of  the  correlation  coefficient.  But,  if  the  variance  in 
the  test  scores  for  the  total  population  is  known  as  well  as  the  variance  of  test 
scores  in  the  restricted  group,  an  estimate  of  the  validity  for  the  population  can  be 
made.  This  process  of  correcting  for  restriction  in  range  was  carried  out  in  the 
present  study,  to  yield  "corrected"  or  "adjusted"  correlations. 

The correction for restriction of range can be applied to the correlation
coefficients obtained for each academy to more accurately estimate the validity of the
exam in predicting academy performance over the whole population of test takers. Since
the variance of scores on the entry level police examination is different
for each academy, this correction is applied to the individual correlations for each
academy.   The  correction  for  restriction  of  range  does  rely  upon  an  assumption  that  a 
linear  relationship  exists  between  exam  and  academy  scores  over  the  whole  range  of 
scores,  i.e.  the  relationship  which  exists  in  the  upper  segment  continues  into  the 
lower  segment  of  the  population  as  well. 
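
The report does not reproduce the correction formula it used. A minimal sketch, assuming the commonly cited correction for direct selection on the predictor (often called Thorndike's Case 2), is given below. It needs only the restricted correlation and the two standard deviations of test scores, which matches the quantities described above. The function name and the example values are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the full-population correlation from a range-restricted one.

    Assumed formula (direct selection on the predictor):
        R = r*k / sqrt(1 - r**2 + (r*k)**2),  k = sd_unrestricted / sd_restricted
    It relies on the linearity assumption described in the text.
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    k = sd_unrestricted / sd_restricted
    rk = r_restricted * k
    return rk / math.sqrt(1.0 - r_restricted ** 2 + rk ** 2)


# Hypothetical academy: observed r = 0.50, and the applicant population's
# test-score standard deviation is twice that of the academy class.
adjusted = correct_for_range_restriction(0.50, sd_unrestricted=2.0, sd_restricted=1.0)
```

With these illustrative inputs the adjusted coefficient is about 0.76. When the two standard deviations are equal, the function returns the observed correlation unchanged.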

2.3.3  Averaging  Correlation  Coefficients 

Several academies were contacted and general information on curricula and
scoring procedures was gathered. It was found that sufficient differences existed
along several variables (namely, subject matter, length of the course, and subjects
taught) so that treating the sample as one unit would not provide an accurate measure
of validity. It was decided therefore to perform the analysis by subgrouping the
sample according to academy attended. The simplest way of arriving at one validity
coefficient from the separate academies would be to find the average of the correlation
coefficients. However, this posed a statistical problem which was dealt with as
follows: A correlation coefficient is only an index; it does not come from a scale
with equal units. Differences among higher coefficients reflect larger differences in
association than differences among small coefficients. Thus, the difference between
0.90 and 0.80 is much greater than the difference between 0.20 and 0.10. If the
various coefficients are relatively close to one another in magnitude it would be
appropriate to simply average them. However, if there are large differences, some
sort of transformation to a scale with equal units may be made. The best
transformation is to use Fisher's Z coefficient (Guilford & Fruchter, 1975, p. 319). In
the process described by Guilford and Fruchter, the correlation coefficients are
converted to Fisher's Z coefficients, and weighted by N-3 (the degrees of freedom
of a Fisher's Z coefficient) for each academy when averaged. The obtained average
Fisher's Z coefficient is converted back to a correlation coefficient. The sum of
the degrees of freedom is used to evaluate the significance of the result.
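
The averaging procedure described above can be sketched in a few lines. The function name is illustrative; the inputs are the adjusted raw-score coefficients from Table A7 paired with the academy Ns from Table A1. The result should come out close to the overall adjusted coefficient reported in Table 3, with small differences possible from rounding in the tabled values.

```python
from math import atanh, tanh


def fisher_weighted_average(pairs):
    """Average correlations via Fisher's Z, weighting each Z by N - 3.

    pairs -- iterable of (r, n) tuples, one per academy
    Returns (average r, average Z, summed degrees of freedom).
    """
    weighted_sum = 0.0
    total_df = 0
    for r, n in pairs:
        weighted_sum += (n - 3) * atanh(r)  # atanh(r) is Fisher's Z
        total_df += n - 3
    mean_z = weighted_sum / total_df
    return tanh(mean_z), mean_z, total_df


# (adjusted r for raw score, N) by academy, from Tables A7 and A1.
adjusted_raw = [
    (-0.0351, 17), (0.3891, 14), (0.8317, 16), (0.6916, 49),
    (0.6245, 64), (0.6945, 20), (0.5485, 30), (0.9018, 71),
    (0.4628, 22), (0.7203, 47), (0.8058, 26),
]
r_avg, z_avg, df = fisher_weighted_average(adjusted_raw)
print(round(r_avg, 3), round(z_avg, 3), df)
```

Weighting by N-3 keeps the small academies from dominating the average, and the summed weights give the degrees of freedom used for the significance test.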

2.4   Utility

An assessment of the practical significance, the utility, of the test was made.
This is based upon: (1) the magnitude of the validity coefficient; (2) the selection
ratio, the number of job openings to the number of applicants; and (3) the success
ratio, the proportion of hires considered successful. The Taylor-Russell tables give the
percentage of hires considered successful using the selection system when these
three variables are considered.

Thus,  an  assessment  of  the  validity  of  the  test,  and  the  sub-tests,  can  be  made 
by  obtaining  a  correlation  of  the  academy  scores  with  exam  scores  and 
sub-scores  for  each  academy,  correcting  these  coefficients  for  range  restriction, 
and  obtaining  a  weighted  average  through  a  Fisher's  Z  transformation.  An  assessment 
of  practical  utility  can  be  made  by  using  these  validity  coefficients,  and  obtaining 
selection  ratios  and  success  ratios  in  conjunction  with  use  of  the  Taylor-Russell 
tables. 

RESULTS 

3.1   Subjects 

There was a total of 376 subjects (N=376), 338 males and 38 females. This
included a total of 175 veterans and 201 non-veterans. The ethnic composition was:
314 non-minorities, 52 Blacks, and 10 Spanish-Surnamed. The number of appointees per
academy ranged from 14 to 71, and the number of minorities per academy ranged from
0 to 15. (The sample of this study is further described by academy in
Table A1.) According to self-reported data on 341 subjects, 304 have a high school
diploma and 29 have Educational Equivalency Certificates.


* Appendices contain more detailed tables of descriptive statistics and
correlations. These tables carry a letter prefix, such as "A" (e.g. Table A1).
Tables and Figures numbered without a prefix are found in the text.


Means  and  standard  deviations  of  exam  score  for  ethnic  and  sex  breakdowns  are  given 
below  in  Table  1. 


Table 1: Number and Mean Test Scores of Applicants by Ethnic
Group and Sex

Group               # in Pop.   % of Pop.   Mean Raw Score*    S.D.
Native American          182        .71          42.37         11.58
Blacks                 1,398       7.02          38.81         10.50
Spanish Surnamed         481       2.42          31.60         13.80
Orientals                 85        .43          44.40         10.41
Caucasian             16,873      84.76          48.96          8.56
Other                    700       3.516         46.51         10.28
Male                  16,695      83.87          47.82          9.62
Female                 3,148      15.80          47.15          9.80

* Raw Score refers to the number of items answered correctly.


3.2   Predictor   (IPMA  EXAM) 

The original applicant population which was scored in 1975 had an N of 22,398,
a mean percentage score of 68.6, and a standard deviation of 13.663. In the present
rescoring 20,467 cases were verified by optical scanner as completed; this amounted
to some 90.3% of the original applicant group. The remainder of the records were
deemed incomplete due to clerical or computer-related loss of exam or identification
data. Analysis of the confidence interval of the difference between the means of
this scoring and the original scoring is summarized in Table A2. The percentage mean
of the present scoring (N=20,467) is 66.67 with a standard deviation of 14.138.

The mean raw score of the validation sample on the entry level police examination
was 57.186 from a possible total score of 71; the standard deviation was 5.47. This
corresponds to a percentage mean score of 80.54 and a standard deviation of 7.70.
Mean exam scores and academy grades are given in Table A3. Descriptive statistics for
the present scoring by subtests may be found in Table A4.

Frequency distributions of the demographic data, the frequency of response to
the self-report inventory portion of the examination by score, are given in
Table A5.

The reliability coefficient, determined by Kuder-Richardson's formula 20, is
r=0.954. This coefficient is similar to an earlier estimate arrived at by
Kuder-Richardson's formula 21 of r=0.9454. The Dale-Chall readability index was derived
as 7.25, or 9-10th grade level, for this IPMA Form A-1 by an earlier study (California
Selection Consulting Center, 1973).
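
For readers unfamiliar with Kuder-Richardson's formula 20, a minimal sketch follows. The item-level responses are not reproduced in this report, so the data below are a hypothetical toy matrix (rows are examinees, columns are items scored 1 for correct and 0 for incorrect). The sketch uses the population variance of total scores; some texts use the sample variance instead.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses -- list of rows, one per examinee, each a list of 0/1 item scores
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)  # variance of total scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)


# Hypothetical toy data, not taken from the examination.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(toy)
print(reliability)
```

Formula 21 is a shortcut that assumes all items are equally difficult, so it needs only the number of items, the mean, and the variance of total scores.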


3.3   Validity 


The relationship between the examination score and the academy grades for
those in the sample was found to be strongly positive. The corrected Pearson
coefficient of correlation was r=0.719 (d.f. = 354, p < .001). A more
detailed description of the findings follows.

The basic data on sub-scores and raw scores are given in Appendix B, Tables
B1-B6. These include the N for each academy, the mean and standard deviation for
the particular score, and the unadjusted and the adjusted correlation coefficients.
Overall descriptive statistics are given below in Table 2.

Table 2. Examination mean and standard deviation for the total population and the
validation sample by examination sub-area.

                        Sample                   # of        Population*
Score            N       Mean      s.d.         quest.     Mean      s.d.
Sub-score 1     343     19.455     2.614          25      15.436     3.649
Sub-score 2     343      4.324      .818           5       3.63      1.262
Sub-score 3     343      5.603      .799           6       5.16      1.415
Sub-score 4     343     11.863     1.727          15       9.77      2.728
Sub-score 5     343     16.079     2.401          20      13.34      3.641
Raw Score       376**   57.186     5.470          71      47.34     10.038

*  (N=20,467)

** Some subscores were deleted due to omissions in the data collection and
tabulation. Therefore the total N is greater than the N for the subtests.
The uncorrected correlation coefficients for raw score and subscores with academy
grades are given in Table A6. The correlation coefficients for overall
raw score with academy grades range from -0.017 to 0.713. The weighted average
correlation coefficient of the raw score with academy grade, without conversion to
Fisher's Z scores (a conservative "average"), is 0.491 (d.f.=354, p < .001).

Table  A7  gives  the  correlation  coefficients  by  academy  of  raw  score  and  sub- 
scores  with  academy  grades,  after  adjustment  for  restriction  of  range.   These  are 
then  converted  to  Fisher's  Z  scores,  weighted  (by  N-3)  and  an  average  Z  score  is 
obtained  which  is  reconverted  to  an  r.  These  overall  average  adjusted  correlation 
coefficients  are  given  in  Table  3  below,  along  with  the  weighted  (without  conversion 
to  Fisher's  Z)  average  r  for  the  unadjusted  correlation.   Throughout  the  study, 
average  r's  are  weighted  by  the  degrees  of  freedom  of  the  subsample  (i.e.  the 
academy).  All  unadjusted  average  correlation  coefficients  are  computed  without 
conversion  to  Fisher's  Z,  and  all  average  adjusted  r's  are  computed  using  conversion 
to  Fisher's  Z  coefficients,  averaging  these,  then  converting  back  to  an  average  r. 

There were not enough minorities in any one academy to allow statistical tests
of significance to have any substantial meaning. The highest number of minorities
was at the Boston Academy where there were 15. However, to give an idea of possible
directions of difference between minorities and non-minorities, statistics are given
in Tables A8 and A9 for minority subsamples for those academies with a minority
n ≥ 10. Table A8 contains means and standard deviations for academy grade and exam
score, and Table A9 contains validity data for minorities versus non-minorities.

Table 3

Overall Average Unadjusted and Adjusted Validity Coefficients
for Total Score and Sub-Scores

                     Unadjusted                 Adjusted
Sub-score         r      d.f.     p          r      d.f.     p
1               .367     321    .001       .486     310    .001
2               .359     321    .001       .526     310    .001
3               .126     321    .05        .269     310    .001
4               .286     321    .001       .445     310    .001
5               .333     321    .001       .511     310    .001
Total Score     .491     354    .001       .719     343    .001
By determining the standard error of the mean score, one can determine the
accuracy provided by a given academy sample size. A sample size of 30 results
in a 95% confidence of being within 3% of the mean of the population as a whole.
With this as a gauge of the accuracy of a sample size of 30, an average correlation
for only those academies of n ≥ 30 is given. The unadjusted weighted average and
the weighted average r after adjustment for restriction of range for raw score
and sub-scores are given below in Table 4.
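
The sample-size gauge above can be checked with the usual standard-error arithmetic. The report does not say which standard deviation was used; the sketch below assumes the validation sample's percentage-score s.d. of 7.70 from section 3.2 and a normal-approximation multiplier of 1.96. Both choices, and the function name, are assumptions made for illustration.

```python
from math import sqrt


def margin_of_error(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / sqrt(n)


# Assumed inputs: s.d. of 7.70 percentage points, academy sample of 30.
margin = margin_of_error(7.70, 30)
print(round(margin, 2))
```

Under these assumptions the half-width is a little under 3 percentage points, which is in line with the figure quoted in the text.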

Although the highest N for minorities (Black and Spanish-Surnamed) for any
academy was only 15, a check was made on the four (4) academies which had minority
subsamples of 10 or greater. These results, and a description of means and standard
deviations, are given in Tables A8 and A9. One validity coefficient for minorities was
found significant, that of Boston (r=0.71, d.f.=13, p < .002).


Table 4
Overall Average Unadjusted and Adjusted Validity Coefficients
for Those Academies With n ≥ 30

                   Unadjusted              Adjusted
Sub-score         r      d.f.         rc     d.f.    Z(rc)
1               .426     227        .553     222     .624
2               .345     227        .492     222     .539
3               .195     227        .389     222     .399
4               .279     227        .417     222     .444
5               .328     227        .513     222     .568
Total Score     .503     251        .754     246     .982

Discussion 


4.1   Validity 


There is a strong relationship between a score on the 1975 examination and
later performance in academy training. The unadjusted correlation coefficients
which index this relationship are both statistically significant and quite high.
The overall unadjusted average r of 0.49 indicates that within the restricted
sample of the study, about 25% of the variance in academy grades is explained
by the scores on the examination. The corrected average r of 0.719 indicates that
for the group of applicants taking the examination about 50% of the variance in obtained
academy grades would be explained by examination scores. By way of comparison,
typical r's are about 0.30. By any standard this test can serve as a useful
predictive instrument.
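
The variance figures quoted above are the squares of the two average coefficients. A short check, with an illustrative function name:

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by a correlation r."""
    return r * r


unadjusted = variance_explained(0.49)   # restricted validation sample
adjusted = variance_explained(0.719)    # estimate for all applicants
typical = variance_explained(0.30)      # the comparison value cited in the text
print(round(unadjusted, 3), round(adjusted, 3), round(typical, 3))
```

On this scale a coefficient of 0.30 accounts for under a tenth of the variance, which is the comparison the paragraph is drawing.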
4.2   Predictor (IPMA Exam)

High test reliability was found for the examination. Although the examination
has very high validity, there may be other test areas which are job related and,
if included in the examination, would yield even greater predictive power. Future
research should be undertaken to explore this possibility.

The  test  security  of  the  particular  items  used  on  the  present  examination 
can  no  longer  be  assured  since  over  20,000  persons  have  seen  the  questions.   It 
is  suggested  that  in  the  future  tests  be  employed  that  measure  the  same  areas 
(using  similar  questions).   The  extent  to  which  such  new  tests  are  measuring  the 
same  areas  as  the  present  test  may  be  assessed  by  empirical  means. 

4.3   Statistical Reliability and Sub-Sample

A word of caution must be interjected about the interpretation of overall
average coefficients. Since these are dependent upon the statistical power of
the subsamples from which they are derived, they suffer from being partially
based on subsamples of small N, although this is considered in the weighting
procedure. It was felt that deriving a figure which was based upon only those samples
of n ≥ 30 would be useful. As can be seen from Table 4, for the five academies
with n ≥ 30, the validities were comparable to, even slightly higher than, the
full sample. A similar pattern of correlations for the sub-scores as for the full
sample is also present. Thus, by making the constraints for the computations more
stringent, the only effect is to increase slightly the observed validity coefficient.

4.4  Differential  Prediction 

Ideally,  the  task  of  a  predictive  validity  study  is  to  determine  to  what 
degree  performance  on  the  predictor  reflects  performance  on  the  criterion.   Speaking 
strictly  within  the  context  of  validity,  there  seems  little  reason  to  identify  and 
consider various subgroups, ethnic or otherwise, unless there is an indication that
these  characteristics  provide  better  indicators  than  does  a  test  score. 

The  stratification  of  mean  scores  displays  differential  performance  of  the 
identifiable  ethnic  groups  with  respect  to  exam  scores.   This  study  is  not  designed 
to explore the cause of these results and care should be exercised in their
interpretation. The small percentage of minority applicants limited the number who
eventually  were  included  in  police  academy  training,  even  though  provisions  were 
made  by  the  courts  for  increased  minority  inclusion.   In  the  interpretation  of  the 
validity  coefficient  of  the  various  academies'  minority  members,  care  should  be 
taken  since  these  sample  sizes  did  not  exceed  fifteen.    The  results  found  in  these 
tables  of  minority  performance  should  in  general  be  taken  as  possibilities  only  and 
not  be  interpreted  as  substantiated  facts.   It  is  impossible  to  develop  any  general 
trend  from  these  data.  The  results  are  included  for  speculative  purposes. 

4.5  Training  and  Tenure 

Two factors within the overall interpretation warrant additional consideration.
First, the relationship between training and job performance must be established.
Training is required by law in Massachusetts, and an appointee is probationary
until s/he meets the requirement. It is therefore a substantial and common facet
of the police officer function throughout the state. One interesting fact did
emerge in the analysis of the Boston Academy. With the highest N and the strongest
positive correlation, Boston's curriculum is unique in that it alternates 4 weeks
of training with 4 weeks on the job for the 24 week course. A further requirement
minimized the potentially confounding effects of the second factor, tenure.
Appointees must complete training within one year of the date appointed, which gives an
upper limit of nine months of tenure, since the minimum length of training is 12
weeks. The literature on the subject of tenure does not consider this a long
enough time to affect performance levels; it was therefore ruled out from
consideration (Anastasi, 1976).

Since  the  Boston  Academy  grade  is  partially  dependent  upon  actual  police  work, 
the  fact  that  this  academy  had  the  highest  correlation  supports  inferences  of  the 
exam's  ability  to  predict  on-the-job  performance. 


4.6   Utility 


It is good practice to assess the practical significance, the utility, of the
examination. This is influenced by: (1) the magnitude of the validity coefficient,
(2) the selection ratio of the number of job openings to the number of applicants,
and (3) the success ratio, the percentage of hires considered successful.

The  selection  ratio,  determined  by  dividing  the  number  of  hires  from  the 
examination  by  the  number  who  took  the  examination,  was  found  to  be  0.05.   A  set 
of  tables  developed  by  Taylor  &  Russell  permits  a  numerical  estimate  to  be  made  of 
the  gain  in  proportion  of  successful  applicants  due  to  the  examination.   In  order 
to determine this, one must enter the validity coefficient, the selection ratio,
and the success ratio.

The criterion for success as a police officer is not clear cut; there is no
standard  method  for  arriving  at  success.   One  possibility  would  be  to  consider 
as  successful  all  candidates  who  would  be  expected  to  finish  and  pass  the  training, 
considering  as  unsuccessful  all  those  who  dropped  out  or  failed.  Another  method 
would  be  to  use  a  cutoff  on  the  criterion,  i.e.  the  academy  grade.   Perhaps  a  median 
split  on  all  academies  would  suffice;  or  considering  as  successful  those  who  obtained 
a  score  greater  than  one  standard  deviation  below  the  mean  would  also  suffice.  But 
all  of  these  methods  are  rather  arbitrary. 

Since there are multiple methods of arriving at a criterion of success it was
decided to show graphically what increases in percentage successful result from a
range of success ratios for the obtained r = .50 and rc = .70 (rounded), and a selection
ratio of 0.05. Figure 1 shows the relationship, with the straight line representing
the proportion successful without the examination and the curves representing the
proportion successful using the examination. The difference between the two shows
the increase in accuracy of selection of successful candidates when using the
examination for selection. As can be seen, any success ratio short of 1.00 results
in an increase in accuracy, and the lower the initial success ratio the greater
the increase.

In addition to the foregoing factors, consideration should also be given to the
human and economic risk of hiring an unqualified employee. In regard to the job of
Police Officer the risks to society of hiring an unqualified policeman are quite
high, and a lower degree of practical significance is justified. Thus, even if the
success ratio is extremely high, an increase in proportion successful would justify
the use of the test.

From Figure 1, we can see that if 50% of the applicants can do the job, then
the failure rate without using the examination will be about 50%, while the failure
rate using the examination will be about 2%. This is a very dramatic decrease in
the failure rate of new police officers. The failure rate is decreased by a factor
of 25 by use of the examination.
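
A Taylor-Russell style lookup can be approximated numerically. The sketch below is not the published tables: it assumes exam score and criterion are bivariate normal with correlation equal to the validity coefficient, and it integrates with a simple midpoint rule. The function name, parameters, and step count are illustrative. Results should land near the tabled values, though small differences from the published tables or from a reading of Figure 1 are to be expected.

```python
from math import sqrt
from statistics import NormalDist


def proportion_successful(validity, selection_ratio, base_rate, steps=4000):
    """Expected proportion of successful hires when selecting top scorers.

    validity        -- correlation between exam and criterion (|r| < 1)
    selection_ratio -- fraction of applicants hired (0 < value < 1)
    base_rate       -- fraction of applicants who would succeed (0 < value < 1)
    """
    nd = NormalDist()
    x_cut = nd.inv_cdf(1.0 - selection_ratio)  # exam cutoff in z units
    y_cut = nd.inv_cdf(1.0 - base_rate)        # criterion cutoff in z units
    resid_sd = sqrt(1.0 - validity ** 2)
    width = 8.0 / steps                        # integrate 8 s.d. above the cutoff
    joint = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * width
        p_success = 1.0 - nd.cdf((y_cut - validity * x) / resid_sd)
        joint += nd.pdf(x) * p_success * width
    return joint / selection_ratio


with_exam_r70 = proportion_successful(0.70, 0.05, 0.50)
with_exam_r50 = proportion_successful(0.50, 0.05, 0.50)
no_validity = proportion_successful(0.0, 0.05, 0.50)
print(round(with_exam_r70, 3), round(with_exam_r50, 3), round(no_validity, 3))
```

The expected failure rate among hires is one minus the returned proportion. With zero validity the function returns the base rate, which corresponds to the straight line in Figure 1.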

Conclusion 

The  examination  administered  in  1975  has  been  shown  to  be  a  valid  test.  The 
test  development  process  linked  the  content  of  the  test  with  the  content  of  the  job. 
Criterion-related  validity  seen  in  other  jurisdictions  was  also  clearly  seen  in 
Massachusetts.  The  examination  is  reliable  and  has  high  validity  and  high  utility. 
The  use  of  the  examination  is  strongly  supported. 
APPENDIX A


TABLE A1

Sex, Veteran, and Ethnic Composition of the Validation Sample, by Academy

                                          DISABLED   NON-      NON-                SPANISH-   OVERALL
              MALE   FEMALE   VETERAN   VETERAN    VETERAN   MINORITIES   BLACK   SURNAMED      N
Brookline      17       0        2         2         13         17          0        0          17
Barnstable     12       2        6         0          8         11          3        0          14
Waltham        14       2        2         5          9         15          1        0          16
Springfield    44       5        9         3         37         38         10        1          49
S.E. Mass      58       6       17         3         42         63          1        0          64
Medford        20       0        9         4          7         16          3        1          20
M.D.C.         28       2        8        18          4         18         11        1          30
Boston         59      12       17        24         30         56         13        2          71
New Bedford    18       4        2        10         10         12          8        2          22
State          46       1       16         4         27         45          1        1          47
Worcester      22       4        9         3         14         23          1        2          26
Totals        338      38       99        76        201        314         52       10         376

TABLE A2

Difference between the means of the original (1975) scoring and the present
rescoring (computer printout; only partly legible in the source)

                            ORIGINAL SCORING    PRESENT SCORING
SAMPLE MEAN                      68.6                66.67
SAMPLE VARIANCE                 186.678            (illegible)
SAMPLE STD DEVIATION             13.663              14.138
SAMPLE SIZE                      22398               20467

DIFF BETWEEN MEANS                1.93

CONFIDENCE LIMITS ON DIFFERENCE BETWEEN MEANS (levels up to 99.999%):
legible lower limits run from 1.80179 to 1.87312, and legible upper limits
from 1.97773 to 2.05621.


TABLE A3

EXAM SCORES AND ACADEMY GRADES BY ACADEMY

                  ACADEMY GRADE           EXAM RAW SCORE
               Grade X       S.D.       Exam X       S.D.        N       SE
Brookline      87.488       5.1093      56.47       4.9512      17     1.20
Barnstable     91.6         4.4054      58.928      4.5313      14     1.21
Waltham        86.565       3.772       57.875      7.283       16     1.821
Springfield    87.196       5.2407      57.12       6.6415      49      .949
S.E. Mass      87.2450      5.4917      56.20       5.0275      64      .628
Medford        90.8110      3.3542      56.90       5.3104      20     1.187
M.D.C.         92.100       2.264       55.3667     2.9182      30      .533
Boston         87.741       4.27        51.90       4.8878      71      .560
New Bedford    87.477       4.0160      54.4091     3.5812      22      .763
State          91.3706      4.23        59.1915     5.94        47      .366
Worcester      (missing)    5.1382      (missing)   6.5661      26     1.29
TABLE A6

UNADJUSTED VALIDITY COEFFICIENTS FOR RAW SCORE
AND SUBSCORES WITH ACADEMY GRADES

ACADEMY        SUBSCORE 1   SUBSCORE 2   SUBSCORE 3   SUBSCORE 4   SUBSCORE 5   RAW SCORE
Brookline        .1207        .1405       -.1895       -.1242       -.0249       -.0173
Barnstable      -.2488        .7818**     -.0145        .1192        .4871        .1673
Waltham          .5901*       .5376        .0960        .5285*       .5964*       .7360**
Springfield      .4349**      .5283        .5274**      .4031        .3560*       .5352
S.E. Mass        .4856**      .0649        .1461        .1541        .0714        .3717
Medford          .4231        .3124       -.0695        .3306        .2815        .4548*
M.D.C.           .3083        .3013        .0461        .1245        .2896        .1573
Boston           .4581        .4417        .1946        .4030        .5612        .7126***
New Bedford     -.1303        .1561       -.2667        .4598*       .2622        .1531
State            .3820**      .3678*       .0225        .2237        .6141***     .5235
Worcester        .5082*       .5874        .2312        .4045        .5194*       .6645***

*p < .05     **p < .01     ***p < .001


TABLE A7

ADJUSTED VALIDITY COEFFICIENTS FOR RAW SCORE
AND SUBSCORE WITH ACADEMY GRADES

ACADEMY        SUBSCORE 1   SUBSCORE 2   SUBSCORE 3   SUBSCORE 4   SUBSCORE 5   RAW SCORE
Brookline        .2090        .2439       -.2815       -.2420       -.0251       -.0351
Barnstable      -.3646        .8782       -.0213        .2159        .7354        .3891
Waltham          .6461        .7859***     .1524        .7495        .5770*       .8317
Springfield      .5104        .6842        .8262***     .5455***     .4771**      .6916***
S.E. Mass        .5886        .0982        .2501        .2668        .1290        .6245***
Medford          .5044*       .4697*      -.1083        .4388        .5298*       .6945***
M.D.C.           .4721**      .3749*       .0983        .2404       -.4497*       .5485**
Boston           .6206        .6359***     .3581        .5631        .7792***     .9018***
New Bedford     -.2632        .2465       -.4348        .6951***     .4134        .4628*
State            .5012        .4846***     .0313        .3115*       .7936***     .7203***
Worcester        .6348**      .7808***     .4971*       .4624*       .6660        .8058***

*p < .05     **p < .01     ***p < .001


TABLE A8
DESCRIPTIVE STATISTICS FOR MINORITIES FOR THOSE ACADEMIES WITH N ≥ 10

                          ACADEMY GRADE         EXAM SCORE
ACADEMY           N       MEAN      S.D.       MEAN      S.D.
SPRINGFIELD      11       81.1      5.69       49.18     9.47
M.D.C.           12       91.9      1.88       55.66     2.74
BOSTON           15       85.3      3.49       54.93     3.99
NEW BEDFORD      10       86.9      3.60       55.80     2.66
OVERALL          48       85.54     6.64       53.32     4.82

TABLE A9

UNADJUSTED CORRELATION COEFFICIENTS FOR MINORITIES AND NON-MINORITIES FOR THOSE
ACADEMIES WITH N ≥ 10

                       MINORITY                   NON-MINORITY
                  N       r        p          N       r        p
SPRINGFIELD      11     .4720    .071        38     .1480    .188
M.D.C.           12    -.3056    .167        18     .4080    .046
BOSTON           15     .7054    .002        56     .6792    .001
NEW BEDFORD      10     .1637    .326        12     .2895    .181

APPENDIX B

This section contains descriptive statistics and validity coefficients
for total raw score and broken down by subscore. Within each table the
following information is ordered by academy:

N:               Number of subjects contributing to the statistics
Exam X:          Mean IPMA Form A-1 exam score or subscore
Exam s.d.:       Standard deviation of exam scores or subscores
r:               Correlation of academy grade with exam score or subscore,
                 unadjusted (i.e. not corrected for restriction of range)
rc or rcorr:     Correlation of academy grade with exam score or subscore,
                 adjusted (i.e. corrected for range restriction)
Z(rc):           Fisher's Z conversion of rc

Totals:          For exam mean and s.d. this refers to computation using total N.
r:               Average of individual r's (each weighted by N-2)
Z(rc):           Average of individual Z(rc)'s weighted by N-3
rcorr or rc:     Conversion of weighted average Z(rc) into a correlation
                 coefficient


TABLE B1
SUBSCORE 1

EXAM POPULATION:   N = 20467    X = 15.436    s.d. = 3.649

                   N        X       s.d.       r          r corr.     Z(rc)
BROOKLINE         17     18.941    2.076     .1207        .2090       .212
BARNSTABLE        13     19.692    2.394    -.2488       -.3646      -.383
WALTHAM           16     20.063    3.151     .5901*       .6461**     .768**
SPRINGFIELD       45     19.222    2.969     .4349*       .5104**     .563**
S.E. MASS.        51     19.647    2.784     .4856**      .5886**     .677**
MEDFORD           19     18.789    2.917     .4231        .5044*      .556*
M.D.C.            30     18.767    2.208     .3083        .4721**     .513**
BOSTON            64     19.813    2.376     .4581**      .6206**     .727**
NEW BEDFORD       21     17.905    1.758    -.1303       -.2632      -.269
STATE             47     20.000    2.604     .3820**      .5012**     .550**
WORCESTER         20     20.150    2.621     .5082*       .6348**     .750**
TOTALS           343     19.455    2.614     .3674**      .486**      .531
                                             df=321       df=310

*p < .05


TABLE B2
SUBSCORE 2

EXAM POPULATION:   N = 20467    X = 3.63    s.d. = 1.262

                   N        X       s.d.       r          r corr.     Z(rc)
BROOKLINE         17      4.412     .712     .1405        .2439       .249
BARNSTABLE        13      4.077     .862     .7818**      .8782**    1.366
WALTHAM           16      4.500     .633     .5376*       .7859**    1.060
SPRINGFIELD       45      4.400     .837     .5283**      .6842**     .836
S.E. MASS.        51      4.216     .832     .0649        .0982       .098
MEDFORD           19      4.053     .780     .3124        .4697*      .510
M.D.C.            30      4.167     .986     .3013        .3749*      .394
BOSTON            64      4.453     .754     .4417**      .6359**     .75
NEW BEDFORD       21      4.286     .784     .1561        .2465       .252
STATE             47      4.404     .901     .3678*       .4846**     .530
WORCESTER         20      4.300     .733     .5874**      .7808**    1.048
TOTALS           343      4.324     .818     .359**       .526**      .584
                                             df=321       df=310

TABLE B3
SUBSCORE 3

Exam population:   N = 20467    X = 5.16    s.d. = 1.415

                   N        X       s.d.       r          r corr.     Z(rc)
Brookline         17      5.353     .931    -.1895       -.2815      -.289
Barnstable        13      5.615     .961    -.0145       -.0213      -.022
Waltham           16      5.63      .885     .0960        .1524       .153
Springfield       45      5.778     .599     .5274**      .8262**    1.178
S.E. Mass         51      5.529     .809     .1461        .2501       .255
Medford           19      5.474     .905    -.0695       -.1083      -.109
M.D.C.            30      5.667     .661     .0461        .0983       .099
Boston            64      5.687     .732     .1946        .3581**     .375
New Bedford       21      5.429     .811    -.2667       -.4348      -.466
State             47      5.553    1.017     .0225        .0313       .032
Worcester         20      5.650     .587     .2312        .4971*      .545
TOTAL            343      5.603     .799     .126*        .269**      .276
                                             df=321       df=310

*p < .05     **p < .01


TABLE B4
SUBSCORE 4

Exam population:   N = 20467    X = 9.77    s.d. = 2.728

                   N        X       s.d.       r          r corr.     Z(rc)
Brookline         17     12.000    1.369    -.1242       -.2420      -.247
Barnstable        13     12.769    1.487     .1192        .2159       .219
Waltham           16     12.125    1.500     .5285*       .7495       .973
Springfield       45     11.844    1.846     .4031**      .5455       .612
S.E. Mass         51     11.608    1.537     .1541        .2668       .273
Medford           19     11.947    1.957     .3306        .4388       .470
M.D.C.            30     11.567    1.382     .1245        .2404       .245
Boston            64     11.859    1.763     .4030        .5631**     .638
New Bedford       21     11.333    1.461     .4598*       .6951**     .858
State             47     12.213    1.910     .2237        .3115*      .323
Worcester         20     11.750    2.314     .4045        .4624*      .500
TOTAL            343     11.863    1.727     .286**       .445**      .478
                                             df=321       df=310

*p < .05     **p < .01


TABLE B5
SUBSCORE 5

Exam population:   N = 20467    X = 13.34    s.d. = 3.641

                   N        X       s.d.       r          r corr.     Z(rc)
Brookline         17     15.765    3.615    -.0249       -.0251      -.0251
Barnstable        13     17.000    1.871     .4871        .7354       .941
Waltham           16     15.563    3.829     .5964*       .5770*      .657
Springfield       45     15.556    2.555     .3560*       .4771**     .519
S.E. Mass         51     15.902    2.003     .0714        .1290       .130
Medford           19     16.579    1.710     .2815        .5298*      .590
M.D.C.            30     15.200    2.188    -.2896       -.4497*     -.485
Boston            64     16.312    2.092     .5812**      .7792      1.043
New Bedford       21     15.381    2.179     .2622        .4134       .440
State             47     17.021    2.172     .6141        .7936**    1.082
Worcester         20     16.400    2.479     .5194*       .6660**     .804
TOTAL            343     16.079    2.401     .333         .511        .5639
                                             df=321       df=310      df=310

*p < .05     **p < .01


TABLE B6
RAW SCORE

Exam population:   N = 20467    X = 47.34    s.d. = 10.036

                   N        X       s.d.       r          r corr.     Z(rc)
Brookline         17     56.471    4.951    -.0173       -.0351      -.035
Barnstable        14     58.929    4.531     .1873        .3891       .411
Waltham           16     57.875    7.284     .7360        .6317**    1.194
Springfield       49     57.122    6.642     .5352        .6916       .652
S.E. Mass         64     56.203    5.027     .3717**      .6245       .733
Medford           20     56.900    5.310     .4548        .6945**     .856
M.D.C.            30     55.367    2.918     .1873        .5485**     .617
Boston            71     57.901    4.888     .7128        .9018**    1.463
New Bedford       22     54.409    3.581     .1831        .4626       .501
State             47     59.191    5.940     .5235        .7203**     .909
Worcester         26     57.923    6.566     .6648**      .8056      1.115
TOTAL            376     57.186    5.470     .4905**      .719**      .9056
                                             df=354       df=343      df=343


